Optimize applyFromArray by caching existing styles #1785

Merged 4 commits on Oct 30, 2021

Contributor

@eigan eigan commented Jan 10, 2021

This is:

- [x] a new feature

Checklist:

Why this change is needed?

Prevent calling clone and getHashCode when not needed
because these calls are very expensive.

When applying styles to a range of cells, we can cache the
styles we encounter along the way so we don't need to look
them up with getHashCode later.

With these changes, clone and getHashCode are only called on the unique styles in the set of cells being updated.
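
A minimal sketch of the caching idea described above, assuming PHP 8. `DemoStyle` and `applyBold()` are hypothetical names used only for illustration; this is not the code from Style.php, and it skips the hash-based lookup of already existing styles. It only shows how remembering the result per source style keeps the number of clones equal to the number of distinct styles in the range:

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a style object; not PhpSpreadsheet's Style class.
final class DemoStyle
{
    public static int $clones = 0; // counts how often the costly clone happens

    public function __construct(public bool $bold = false, public string $color = '000000')
    {
    }

    public function __clone()
    {
        ++self::$clones;
    }
}

/**
 * Make every cell bold, cloning each distinct source style only once.
 *
 * @param DemoStyle[] $styles     style table, keyed by style index
 * @param int[]       $cellStyles style index used by each cell
 *
 * @return array{0: DemoStyle[], 1: int[]} updated style table and cell indexes
 */
function applyBold(array $styles, array $cellStyles): array
{
    $newIndexByOld = []; // cache: old style index => index of the bold variant

    foreach ($cellStyles as $cell => $oldIndex) {
        if (!isset($newIndexByOld[$oldIndex])) {
            // Cache miss: first time this style shows up in the range.
            $variant = clone $styles[$oldIndex];
            $variant->bold = true;
            $styles[] = $variant;
            $newIndexByOld[$oldIndex] = array_key_last($styles);
        }
        $cellStyles[$cell] = $newIndexByOld[$oldIndex];
    }

    return [$styles, $cellStyles];
}
```

With 1000 cells that share two source styles, this sketch clones twice instead of 1000 times; the price is one cache entry per distinct style.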

Profiling

Memory usage

Since we cache the hash code for each unique style in the range of cells we update, sheets with many styles will use a bit more memory than before, though I don't think it should be much.

Before

~ 0.90s

After

~ 0.28s

Fixes #1784

@eigan eigan force-pushed the optimize-apply-from-array branch 2 times, most recently from 7c0d34c to 827662c Compare January 11, 2021 07:37
@eigan
Contributor Author

eigan commented Feb 22, 2021

Any comments @MarkBaker?

@eigan eigan force-pushed the optimize-apply-from-array branch from 827662c to fd83918 Compare April 8, 2021 08:38
@eigan
Contributor Author

eigan commented Apr 8, 2021

I rebase and fix the checks and conflicts whenever we upgrade this dependency in our project. Let me know if you are going to do a review.

@stale

stale bot commented Jun 26, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
If this is still an issue for you, please try to help by debugging it further and sharing your results.
Thank you for your contributions.

@stale stale bot added the stale label Jun 26, 2021
@MarkBaker MarkBaker removed the stale label Jun 30, 2021
@eigan eigan force-pushed the optimize-apply-from-array branch 2 times, most recently from d3c7ef4 to 2ae2d7a Compare July 14, 2021 08:48
@eigan
Contributor Author

eigan commented Sep 19, 2021

@MarkBaker @oleibman is anything wrong with this pull request? I don't get why you guys never respond while the repo is still fully active. Over 200 other pull requests seem to have been merged since this PR was opened.

@oleibman
Collaborator

My apologies, I do not have sufficient expertise in the area of PHP performance to properly evaluate your change. I will try to read up on it, but it will take a while.

@eigan eigan force-pushed the optimize-apply-from-array branch from 2ae2d7a to de8aaef Compare September 20, 2021 05:08
@eigan
Contributor Author

eigan commented Sep 20, 2021

@oleibman Thanks. I will rebase and take a second look at the logic myself. Been a while.

EDIT: Seems like I messed up this pull last time I rebased.

@eigan
Contributor Author

eigan commented Sep 20, 2021

The work done in a189d93 messed up my PR. Need to refactor a bit.

@eigan eigan force-pushed the optimize-apply-from-array branch from de8aaef to 70b9682 Compare September 20, 2021 07:05
@eigan
Contributor Author

eigan commented Sep 20, 2021

PR updated. You can test the effect by using the snippet from the linked issue.

The improvement from 0.9s to 0.3s might not seem huge, but we generate some really large spreadsheets, and this patch reduces the time taken by several minutes.

Note: PHPStan bug encountered. $newStyle is actually always defined. Not sure what you guys would prefer to do: either ignore the PHPStan message or add an if ($newStyle === null) { check. Please let me know.

@oleibman
Collaborator

oleibman commented Oct 3, 2021

You have a php-cs-fixer problem as well. I believe the phpstan-var statement at line 81 in Style.php needs to be followed by a blank line (well, a line with just an asterisk in column 6) to eliminate the problem.

As for the phpstan problem, I don't yet understand what you're doing well enough to make a recommendation. In the code before you changed it, newStyle was always assigned a value (as a clone of style). With your change, it is only assigned in the first part of an if ... else ... statement. So, Phpstan's analysis is correct - newStyle may be unassigned. You could eliminate the phpstan problem by moving the assignment to newStyle before the if ... but this might affect your performance numbers if you have to make a lot of unused clones. The other logical alternative is to assign null to newStyle before that if, and then test if newStyle isn't null before using it later. That should not affect your performance much, but I'm not completely convinced that doing nothing in this case is the correct action. It probably is, especially if you can convince me that newStyle will always be non-null here.
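
A small sketch of the second alternative described above (assign null first, then test before use), assuming PHP 8. The function and variable names are hypothetical, and `strtoupper()` merely stands in for the expensive clone; this is not the code from Style.php:

```php
<?php

declare(strict_types=1);

// Hypothetical names throughout; this only illustrates the control flow.
function resolveStyle(?string $existing, string $source): string
{
    // Assigned on every path, so a static analyser no longer sees a
    // possibly-undefined variable.
    $new = null;

    if ($existing === null) {
        // Stand-in for the expensive clone + applyFromArray step.
        $new = strtoupper($source);
    }

    if ($existing !== null) {
        return $existing;
    }

    // Explicit guard: the "cannot happen" case fails loudly instead of silently.
    if ($new === null) {
        throw new LogicException('No style was built for a missing cache entry.');
    }

    return $new;
}
```

The extra assignment costs next to nothing compared with an unconditional clone, which is the trade-off weighed in the comment above.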

@eigan
Contributor Author

eigan commented Oct 19, 2021

> I don't yet understand what you're doing well enough to make a recommendation

Not sure how I can help you here. Are the comments not enough?

> if you can convince me that newStyle will always be non-null here.

[Screenshot of the code, with two spots marked 1 and 2]

Remember, $newStyle is only used if $existingStyle is falsy.

  1. This is OK. $newStyle is always set, regardless of $existingStyle.
  2. In the else, $newStyle is set if $existingStyle is null.

Now, there is no way you could end up with $existingStyle being falsy AND $newStyle not defined. phpstan/phpstan#5805

Note: The code has changed since the picture was taken. In that state, $existingStyle could be false, and $existingStyle === null would fail to catch that.

@oleibman
Collaborator

oleibman commented Oct 19, 2021

Okay, you've convinced me. You still need to convince Phpstan. Try this just before the addCellXf at the end of your block:

$newStyle = $newStyle ?? new Style();

This will add minimal overhead, and Phpstan should be satisfied.
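
For illustration, a self-contained sketch of that fallback, assuming PHP 8 and a hypothetical `DemoStyle` class rather than the real Style. The null coalescing operator behaves like isset(), so it does not raise a warning when the variable on its left was never assigned:

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the real Style class.
final class DemoStyle
{
    public bool $bold = false;
}

function pickStyle(?DemoStyle $existing): DemoStyle
{
    if ($existing === null) {
        // Only this branch assigns $fresh.
        $fresh = new DemoStyle();
        $fresh->bold = true;
    }

    // `??` acts like isset(): an unassigned or null $fresh falls through
    // to the default object without a warning.
    $fresh = $fresh ?? new DemoStyle();

    return $existing ?? $fresh;
}
```

The default object is only constructed when `$fresh` is missing, so the common path pays for nothing more than the isset-style check.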

@eigan
Contributor Author

eigan commented Oct 19, 2021

Okay, we could ignore the error, but I will create a suggestion. I think the best approach would be to do the same as we do in "1" from the screenshot, so the style is actually applied.

@eigan eigan force-pushed the optimize-apply-from-array branch from f923645 to d173aee Compare October 27, 2021 04:29
@eigan
Contributor Author

eigan commented Oct 27, 2021

@oleibman The $newStyle issue should now be fixed.

@eigan eigan force-pushed the optimize-apply-from-array branch 2 times, most recently from 999b701 to 07c8da0 Compare October 29, 2021 05:01
Prevent calling clone and getHashCode when not needed
because these calls are very expensive.

When applying styles to a range of cells, we can cache the
styles we encounter along the way so we don't need to look
them up with getHashCode later.
@eigan eigan force-pushed the optimize-apply-from-array branch from 07c8da0 to 78f3b6a Compare October 29, 2021 05:07
@eigan
Contributor Author

eigan commented Oct 29, 2021

My own test failed 🤦🏻‍♂️ Fixed now. Rebased on latest master and moved my changelog entry into "Fixed" instead of "Changed".

@PowerKiKi PowerKiKi merged commit 7635b3f into PHPOffice:master Oct 30, 2021
@PowerKiKi
Member

Thank you for your patience. As you have probably guessed by now, there are only three guys on our team, and we all do this in our spare time. So it can take a bit of time until we can process PRs. But quality PRs like yours do end up being merged, sooner or later.

On a side note, for your next PR, you might want to consider breaking (existing) things into smaller methods. We already have some extremely huge methods, and we'd rather try to avoid that in the future. In this specific case I would probably have written a small class dedicated to the cache, to better clarify the concerns of each part, and keep Style.php smaller.
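
One possible shape for such a class, as a rough sketch only and assuming PHP 8. `StyleHashCache` and its method names are hypothetical and do not exist in PhpSpreadsheet; the hashing function is injected so the sketch stays independent of the real Style class:

```php
<?php

declare(strict_types=1);

// Hypothetical cache class; the name and methods are illustrative only.
final class StyleHashCache
{
    /** @var array<int, string> object id => hash code */
    private array $hashByObjectId = [];

    /** @var array<string, object> hash of source style => resulting style */
    private array $resultByHash = [];

    public function __construct(private Closure $hasher)
    {
    }

    // Hash each object at most once. Object ids can be reused after an
    // object is destroyed, so the cache must not outlive the styles it saw.
    public function hashOf(object $style): string
    {
        return $this->hashByObjectId[spl_object_id($style)] ??= ($this->hasher)($style);
    }

    // Result remembered for a source style, or null when none is stored yet.
    public function resultFor(object $style): ?object
    {
        return $this->resultByHash[$this->hashOf($style)] ?? null;
    }

    public function remember(object $style, object $result): void
    {
        $this->resultByHash[$this->hashOf($style)] = $result;
    }
}
```

A caller would create one instance per range operation and discard it afterwards, which keeps the cache short-lived and the bookkeeping out of the styling logic.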

@madhurbhaiya

Sorry, but I think that this PR and eventually v1.19 has broken a few things.

Please check the reported issue here: #2366

Successfully merging this pull request may close these issues.

Possible to optimize Style::applyFromArray()?
5 participants